Predicting Document Coverage for Relation Extraction

Authors

Abstract

This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): Does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base (KB) construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity, and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
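The abstract reports that simple surface features such as document length and entity mention frequency each correlate only moderately with coverage. The Python below is a minimal sketch of how such a correlation could be measured. It assumes that coverage is the number of relational tuples a document yields for the entity and that mention frequency is a case-insensitive substring count. Both are illustrative assumptions, not the paper's definitions, and the helper names `surface_features` and `pearson`, along with the toy corpus, are hypothetical.

```python
import math
import re


def surface_features(text: str, entity: str) -> dict:
    """Hypothetical definitions of two surface features named in the abstract.

    length:   number of word tokens (regex \\w+), lowercased
    mentions: case-insensitive, non-overlapping substring matches of the entity
              (an assumption; this also matches e.g. "Curies")
    """
    lowered = text.lower()
    tokens = re.findall(r"\w+", lowered)
    mentions = len(re.findall(re.escape(entity.lower()), lowered))
    return {"length": len(tokens), "mentions": mentions}


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy corpus: (document text, assumed number of relational tuples).
    entity = "Curie"
    corpus = [
        ("Curie was a physicist.", 1),
        ("Marie Curie was born in Warsaw. Curie won two Nobel Prizes.", 3),
        ("Curie studied in Paris. Curie married Pierre. "
         "Curie discovered polonium and radium.", 5),
    ]
    feats = [surface_features(text, entity) for text, _ in corpus]
    coverage = [tuples for _, tuples in corpus]
    for name in ("length", "mentions"):
        r = pearson([f[name] for f in feats], coverage)
        print(f"{name}: r = {r:.2f}")
```

On real data, each coefficient would only indicate how much a single feature tracks coverage; the abstract's point is that no single feature of this kind is strongly predictive, which motivates combining features with models such as TF-IDF and BERT.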


Similar resources

Multi-Document Summarisation Using Generic Relation Extraction

Experiments are reported that investigate the effect of various source document representations on the accuracy of the sentence extraction phase of a multidocument summarisation task. A novel representation is introduced based on generic relation extraction (GRE), which aims to build systems for relation identification and characterisation that can be transferred across domains and tasks withou...


Single-Document Keyphrase Extraction for Multi-Document Keyphrase Extraction

Here, we address the task of assigning relevant terms to thematically and semantically related sub-corpora and achieve superior results compared to the baseline performance. Our results suggest that more reliable sets of keyphrases can be assigned to the semantically and thematically related subsets of some corpora if the automatically determined sets of keyphrases for the individual documents ...


Collective Cross-Document Relation Extraction Without Labelled Data

We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). F...


Global Relation Embedding for Relation Extraction

Recent studies have shown that embedding textual relations using deep neural networks greatly helps relation extraction. However, many existing studies rely on supervised learning; their performance is dramatically limited by the availability of training data. In this work, we generalize textual relation embedding to the distant supervision setting, where much larger-scale but noisy training dat...


Large-Coverage Root Lexicon Extraction for Hindi

This paper describes a method using morphological rules and heuristics, for the automatic extraction of large-coverage lexicons of stems and root word-forms from a raw text corpus. We cast the problem of high-coverage lexicon extraction as one of stemming followed by root word-form selection. We examine the use of POS tagging to improve precision and recall of stemming and thereby the coverage ...


Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2022

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00456